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SPATIAL MODELS GENERATED BY NESTED STOCHASTIC 
PARTIAL DIFFERENTIAL EQUATIONS, WITH AN APPLICATION 
TO GLOBAL OZONE MAPPING 

By David Bolin and Finn Lindgren 

Lund University 

A new class of stochastic field models is constructed using nested 
stochastic partial differential equations (SPDEs). The model class 
is computationally efficient, applicable to data on general smooth 
manifolds, and includes both the Gaussian Matern fields and a wide 
family of fields with oscillating covariance functions. Nonstationary 
covariance models are obtained by spatially varying the parameters 
in the SPDEs, and the model parameters are estimated using direct 
numerical optimization, which is more efficient than standard Markov 
Chain Monte Carlo procedures. The model class is used to estimate 
daily ozone maps using a large data set of spatially irregular global 
total column ozone data. 

1. Introduction. Building models for spatial environmental data is a 
challenging problem that has received much attention over the past years. 
Nonstationary covariance models are often needed since the traditional sta- 
tionary assumption is too restrictive for capturing the covariance structure in 
many problems. Also, many environmental data sets today contain massive 
amounts of measurements, which makes computational efficiency another 
increasingly important model property. One such data set, which will be an- 
alyzed in this work, is the the Total Ozone Mapping Spectrometer (TOMS) 
atmospheric ozone data [McPeters et al. (1996)]. The data was collected 
by a TOMS instrument onboard the the near-polar, Sun-synchronous or- 
biting satellite Nimbus-7, launched by NASA on October 24, 1978. During 
the sunlit portions of the satellite's orbit, the instrument collected data in 
scans perpendicular to the orbital plane. A new scan was started every eight 
seconds as the spacecraft moved from south to north. A number of recent 
papers in the statistical literature [Cressie and Johannesson (2008), Jun and 
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Stein (2008), Stein (2007)] have studied the data, and it requires nonstation- 
ary covariance structures as well as efficient computational techniques due 
to the large number of observations. 

A covariance model that is popular in environmental statistics is the 
Matern family of covariance functions [Matern (I960)]. The Matern covari- 
ance function has a shape parameter, u, a scale parameter, k, and a variance 1 
parameter, (p 2 , and can be parametrized as 

< u » c ' h »° ^|^W Mh|ir ^ (4h|l) - h6R "' 

where K v is a modified Bessel function of the second kind of order v > 0. One 
drawback with defining the model directly through a covariance function, 
such as (1.1), is that it makes nonstationary extensions difficult. Another 
drawback is that, unless the covariance function has compact support, the 
computational complexity for calculating the Kriging predictor based on n 
measurements is 0(n 3 ). This makes the Matern covariance model computa- 
tionally infeasible for many environmental data sets. 

Recently, Lindgren, Rue and Lindstrom (2010) derived a method for ex- 
plicit, and computationally efficient, Markov representations of the Matern 
covariance family. The method uses the fact that a random process on R d 
with a Matern covariance function is a solution to the stochastic partial 
differential equation (SPDE) 

(1.2) (k 2 -A) q / 2 X(s) = 0W(s), 

where W(s) is Gaussian white noise, A is the Laplace operator, and a = 
v + d/2 [Whittle (1963)]. Instead of defining Matern fields through the co- 
variance functions (1.1), Lindgren, Rue and Lindstrom (2010) used the so- 
lution to the SPDE (1.2) as a definition. This definition is valid not only 
on M. d but also on general smooth manifolds, such as the sphere, and facili- 
tates nonstationary extensions by allowing the SPDE parameters k 2 and (j) 
to vary with space. The Markov representations were obtained by consider- 
ing approximate stochastic weak solutions to the SPDE; see Section 3 for 
details. 

In this paper we extend the work by Lindgren, Rue and Lindstrom (2010) 
and construct a new flexible class of spatial models by considering a gen- 
eralization of (1.2). This model class contains a wide family of covariance 
functions, including both the Matern family and oscillating covariance func- 
tions, and it maintains all desirable properties of the Markov approximated 
Matern model, such as computational efficiency, easy nonstationary exten- 
sions and applicability to data on general smooth manifolds. 



x With this parametrization, the variance C(0) is (j> 2 {'i-K)~ d/2 Y{v)Y{v + d/2)' 1 k~ 2v . 
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The model class is introduced in Section 2, with derivations of some ba- 
sic properties, examples of covariance functions that can be obtained from 
these models and a discussion on nonstationary extensions. Section 3 gives 
a review of the Hilbert space approximation technique and shows how it can 
be extended to give computationally efficient representations also for this 
new model class. In Section 4 a numerical parameter estimation procedure 
for the nested SPDE models is presented, and the section concludes with a 
discussion on computational complexity for parameter estimation and Krig- 
ing prediction. In Section 5 the model class is used to analyze the TOMS 
ozone data. In particular, all measurements available from October 1st, 1988 
in the spatially and temporally irregular "Level 2" version of the data set 
are used. This data set contains approximately 180,000 measurements, and 
the nonstationary version of the model class is used to construct estimates 
of the ozone field for that particular day. Finally, Section 6 contains some 
concluding remarks and suggestions for further work. 

2. Stationary nested SPDE models. A limitation with the Matern co- 
variance family is that it does not contain any covariance functions with 
negative values, such as oscillating covariance functions. One way of con- 
structing a larger class of stochastic fields is to consider a generalization of 
the SPDE (1.2): 

(2.1) £iX(s) = £ 2 W(s), 

for some linear operators C\ and C 2 - If £-1 and C 2 are commutative opera- 
tors, (2.1) is equivalent to the following system of nested SPDEs: 

£iX (s) = W(s), 

(2.2) 

X(s)=C 2 X (s). 

This representation gives us an interpretation of the consequence of the addi- 
tional differential operator C 2 : X(s) is simply C 2 applied to the solution one 
would get to (2.1) if C 2 was the identity operator. Equation (2.1) generates 
a large class of random fields, even if the operators C\ and £2 are restricted 
to operators closely related to (1.2). One of the simplest extensions of the 
Matern model is to let L\ be the same as in (1.2) and use C 2 = (6 + B T V), 
where V is the gradient, b € K, and B 6 M. d . The equation then is 

(2.3) {k 2 - A) a / 2 X{s) = (6 + B T V)W(s), 

and X(s) is a weighted sum of a Matern field and its directional derivative 
in the direction determined by the vector B. This model is closely related 
to the models introduced in Jun and Stein (2007) and Jun and Stein (2008), 
and the connection is discussed later in Section 5. To get a larger class of 
models, the orders of the operators C\ and C 2 can be increased, and to get a 
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class of stochastic fields that is easy to work with, the operators are written 
as products, where each factor in the product is equal to one of the operators 
in (2.3). Thus, let 

m 

(2.4) A=n^ 2 - A r /2 

i=l 

for a, G N and nf > 0, and use 

(2.5) £ 2 = H(b l + BjV) 

i=l 

for bi 6 M and Bj £ M. d . Hence, the SPDE generating the class of nested 
SPDE models is 

(2.6) (j[(K 2 - A)^ X(s) = (jj(& 4 + BjV)^ W(s). 

There are several alternative equations one might consider; one could, for 
example, let £2 be on the same form as £%, or allow for anisotropic operators 
on the form (1 - V T AV) for some positive definite matrix A. However, to 
limit our scope, we will from now on only consider model (2.6). 

2.1. Properties in ]R rf . In this section some basic properties of random 
fields generated by (2.6), when s G W 1 , are given. First note that all Matern 
fields with shape parameters satisfying v + d/2 G N are contained in the 
class of stochastic fields generated by (2.6) since (k 2 — A) a//2 can be written 
on the form (2.4) for these values of v. Also note that the order of the 
operator £2 cannot be larger than the order of £\ if X(s) should be at 
least as "well behaved" as white noise; hence, one must have Y17=i a i — n 2- 
The smoothness of X(s) is determined by the difference of the orders of the 
operators £\ and £2- In order to make a precise statement about this, the 
spectral density of X(s) is needed. 

Proposition 2.1. The spectral density for X(s) defined by (2.6) is 
given by 



S-(k) 



2 n^iflj+^BjBjk) 

( 2 ^) d n"=iK- + l|k|| 2 ) aj ' 



This proposition is easily proved using linear filtering theory [see, for 
example, Yaglom (1987)]. Given the spectral density of X(s), the following 
proposition regarding the sample function regularity can be proved using 
Theorem 3.4.3 in Adler (1981). 
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Proposition 2.2. X(s) defined by (2.6) has almost surely continuous 
sample functions if 2 Ylt=l Q * ~~ ^ n 2 > ^' 

Because the stochastic field X(s) is generated by the SPDE (2.6), the fol- 
lowing corollary regarding sample path differentiability is also easily proved 
using the fact that the directional derivative of X(s) is in the class of nested 
SPDE models. 

Corollary 2.3. Given that 2^™J =1 aj — 2n2 — d> m, the mth order 
directional derivative of X(s), (B T X7) m X(s), has almost surely continuous 
sample functions. 

Hence, as 2^™1 1 ai — 2ri2 increases, the sample paths become smoother, 
and eventually become differentiable, twice differ entiable, and so on. One 
could also give a more precise characterization of the sample path regularity 
using the notion of Holder continuity. This is (more or less) straightforward 
using properties of index-/3 random fields [Adler (1981)], but outside the 
scope of this article. 

A closed-form expression for the covariance function is not that interesting 
since none of the methods that are later presented for parameter estimation, 
spatial prediction or model validation require an expression for the covari- 
ance function; however, if one were to use some technique that requires the 
covariance function, it can be derived. An expression for the general case 
is quite complicated, and will not be presented here. Instead we present a 
recipe for calculating the covariance function for given parameters of the 
SPDE, with explicit results for a few examples. 

To calculate the covariance function of A(s), first calculate the covariance 
function, Cx (h), of Ao(s), given by (2.2). Given this covariance function, 
the covariance function for X(s) is obtained as 



The motivation for this expression is again directly from linear filter theory, 
and the (i-dimensional equivalent of the formula for the covariance function 
for a differentiated stochastic process, rx'( T ) = ~ r x( T )- ^° S e t an expression 
for Cx (h), first use Proposition 2.1 with £2 = I to get the spectral density 
of Ao(s). Using a partial fraction decomposition, the spectral density can 
be written as 




(2.7) 
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Fig. 1. Covariance functions of random fields obtained from model (2.6) with parameters 
from Example 1 (top left), Example 2 (top middle and right), Example 3 (bottom left and 
middle) and Example 4 (bottom right). 



where pij is a real constant which can be found using the Heaviside cover- 
up method [see, for example, Thomas and Finney (1995), page 523]. Now, 
by taking the inverse Fourier transform of (2.7), the covariance function for 
X (s) is 

i=l j=l 

where C£(h) denotes a Matern covariance function with shape parameter 
v, scale parameter k and variance parameter qt> 2 . The final step is to use 
that the derivative of a Matern covariance function can be expressed using 
a Matern covariance with another shape parameter. More precisely, one has 

JL cah ) = -£ cr '(h), 

where hi denotes the zth component of the vector h. Using these calcula- 
tions, one can obtain the covariance function for any field given by (2.6). 
We conclude this section by showing the covariance function for some simple 
cases in M 2 . The covariance functions for these examples are shown in Fig- 
ure 1, and realizations of Gaussian processes with these covariance functions 
are shown in Figure 2. 



SPATIAL MODELS GENERATED BY NESTED SPDES 



7 



♦ r >/•- * 

Fig. 2. Realizations of random fields obtained from model (2.6) with different parameters. 
The realization in each panel corresponds to a stochastic field with the covariance function 
shown in the corresponding panel in Figure 1. 

Example 1. With L\ = (k 2 - A)"/ 2 and C 2 as the identity operator, 
the standard Matern covariance function (1.1) is obtained, shown in the top 
left panel of Figure 1 . 

Example 2. The simplest nested SPDE model (2.3) has the covariance 
function 

C(h) = bC»(h) + ^^(h) - ^^C^(h). 

A stochastic field with this covariance function is obtained as a weighted sum 
of a Matern field -Xo(s) and its directional derivative in the direction of B. 
The field therefore has a Matern-like behavior in the direction perpendicular 
to B and an oscillating behavior in the direction of B. In the upper middle 
panel of Figure 1, this covariance function is shown for B = (1,0) T , v = 3, 
and 6 = 5. In the upper right panel of Figure 1, it is shown for B = (1,0) T , 
v = 3, and b = 0. 

Example 3. The number of zero crossings of the covariance function in 
the direction of B is at most n 2 . In the previous example we had n 2 = 1, 
and to obtain a more oscillating covariance function, the order of C 2 can be 
increased by one: 

(k 2 - A) a / 2 X(s) = (h + B[V)(6 2 + BjV)W(s). 
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This model has the covariance function 

c ( h) = fekczch) + 6aBfBl 2 ^ B ' Ba cr i (h) 

2(B 2 r B 1 ) 2 + B7BxBjB 2 - h T (6!B 2 Bj + fegigTjh 2 

+ 2M^-i) (h) 

h T (B 1 B 2 r B 2 B7 + 4BXB7B2BJ + BggTgiBjjh 3 

2 3 i/(z>-l)(^-2) 1 j 

(B7hbJB 2 ) 2 

^2 4 ^( I /-l)(zy-2)(i/-3) K W " 

In the bottom left panel of Figure 1 this covariance function is shown for v = 
5, b\ = 6 2 = and Bi = B 2 = (1, 0) T . With these parameters, the covariance 
function is similar to the covariance function in the previous example, but 
with one more zero crossing in the direction of B. For this specific choice of 
parameters, the expression for the covariance function can be simplified to 

c(h) = 3 72 cr 2 (h) - e^hjc^ih) + 74 ^cr 4 (h), 

where 7 fc = (2*II*~q (1/ — In the bottom middle panel of Figure 1 the 

covariance function is shown for v = 5, 61 = 6 2 = 0, Bi = (1,0) T , and B 2 = 
(0, 1) T . Thus, the field Xo(s) is differentiated in two different directions, and 
the covariance function for X(s) therefore is oscillating in two directions. For 
these parameters, the covariance function can be written as 

c(h) = 72 cr 2 (h) - 73 h T hcr 3 (h) + 74 / il / l2 cr 4 (h). 

Example 4. The bottom right panel of Figure 1 shows a covariance 
function for the nested SPDE 

(k 2 - A) a ^X(s) = (bx + B7V) 2 (6 2 + B 2 r V) 2 W(s). 

As in the previous examples, the covariance function for a stochastic field 
generated by this SPDE can be calculated and written on the form 

c(h)=^ 7fc / fc (h)cr fc (h), 

fc=0 

where fi-(h),k = 0, . . . , 8, are functions depending on h and the parameters in 
the SPDE. Without any restrictions on the parameters, it is a rather tedious 
exercise to calculate the functions /fe(h), and we therefore only show them 
for the specific set of parameters that are used in Figure 1: v = 7, b\ = 6 2 = 0, 
Bx = (1,0) T and B 2 = (0, 1) T . In this case / (h) = /i(h) = / 2 (h) = 0, and 
the covariance function is 

c(h) = 9 74 cr 4 (h) - i8 75 h T hcr 5 (h) + 3 76 (/i 4 + hi + i2^!)cr 6 (h) 

- Q l7 hlh 2 2 h T hC v K - 7 (h) + 1% h\h\C»-\h). 
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2.2. Nonstationary nested SPDE models. Nonstationarity can be intro- 
duced in the nested SPDE models by allowing the parameters Ki, hi and Bj 
to be spatially varying: 



\i=l / 

If the parameters are spatially varying, the two operators are no longer 
commutative, and the solution to (2.8) is not necessarily equal to the solution 



For nonstationary models, we will from now on only study the system of 
nested SPDEs (2.8), though it should be noted that the methods presented 
in the next sections can be applied to (2.9) as well. 

One could potentially use an approach where the spatially varying pa- 
rameters also are modeled as stochastic fields, but to be able to estimate 
the parameters efficiently, it is easier to assume that each parameter can be 
written as a weighted sum of some known regression functions. In Section 5 
this approach is used for a nested SPDE model on the sphere. In this case, 
one needs a regression basis {^(s)} for the vector fields Bj(s) on the sphere. 
Explicit expressions for such a basis are given in the Appendix. 

3. Computationally efficient representations. In the previous section co- 
variance functions for some examples of nested SPDE models were derived. 
Given the covariance function, standard spatial statistics techniques can 
be used for parameter estimation, spatial prediction and model simulation. 
Many of these techniques are, however, computationally infeasible for large 
data sets. Thus, in order to use the model for large environmental data sets, 
such as the ozone data studied in Section 5, a more computationally efficient 
representation of the model class is needed. In this section the Hilbert space 
approximation technique by Lindgren, Rue and Lindstrom (2010) is used to 
derive such a representation. 

The key idea in Lindgren, Rue and Lindstrom (2010) is to approximate the 
solution to the SPDE C\Xq(s) = W(s) in some approximation space spanned 
by basis functions (p± (s) , . . . , (p n (s) . The method is most efficient if these basis 
functions have compact support, so, from now on, it is assumed that {fi} 
are local basis functions. The weak solution of the SPDE with respect to 




of 



(2.9) 
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the approximation space can be written as x(s) = Y27=i w i i Pi( s )^ where the 
stochastic weights {wj}" =1 are chosen such that the weak formulation of the 
SPDE is satisfied: 

(3.1) [(<Pi,£ix)n]i=i,...,n = [(tpi,W)n]i=i,...,n- 

Here = denotes equality in distribution, Q, is the manifold on which s is 
defined, and (f,g)n = J n /(s)g(s) ds is the scalar product on Q. As an illus- 
trative example, consider the first fundamental case L\ = k 2 — A. One has 

n 

(tpi,C\x)o = y^ j Wj((p i ,£ 1 (pj)n, 
j=i 

so by introducing a matrix K, with elements K» » = (ipi,Ci<pj)n, and the 
vector w = (w\, . . . ,w n ) T , the left-hand side of (3.1) can be written as Kw. 
Since 

{<Pi,£i<Pj)n = K 2 (tpi,(pj)n - (<pi,A<pj)n 

= K 2 (ipi,(fj)n + {V(pi,V(pj)n, 

the matrix K can be written as K = k 2 C + G, where Cjj = {<fi,tpj)n and 
Gij = (S/(fi,Vipj)n. The right-hand side of (3.1) can be shown to be Gaus- 
sian with mean zero and covariance C. For the Hilbert space approxima- 
tions, it is natural to work with the canonical representation, x ~ Nc*(b, Q), 
of the Gaussian distribution. Here, the precision matrix Q is the inverse 
of the covariance matrix, and the vector b is connected to the mean, /x, 
of the Gaussian distribution through the relation /x = Q _1 b. Thus, if K is 
invertible, one has 

Kw~N c (0,C^ 1 ) w~ N c (0,KC" 1 K). 

For the second fundamental case, C\ = (k 2 — A) 1 / 2 , Lindgren, Rue and Lind- 
strom (2010) show that w~ Nc*(0,K). Given these two fundamental cases, 
the weak solution to C\Xq(s) = W(s), for any operator on the form (2.4), 
can be obtained recursively. If, for example, C± = (k 2 — A) 2 , the solution is 
obtained by solving (k 2 — A)Xq(s) = x(s), where x is the weak solution to 
the first fundamental case. 

The iterative way of constructing solutions can be extended to calculate 
weak solutions to (2.6) as well. Let xq = Y^7=i u, i > ¥ , «( s ) be a weak solution 
to £iXq(s) = W(s), and let Qx denote the precision for the weights wo = 
(wi, . . . ,Wn) T ■ Substituting Xq with xq in the second equation of (2.1), the 
weak formulation of the equation is 

[{tpi,x)n]i=i,... >n = [(</??, £2£o)n]i=i,...,n 

(3-2) 

n 
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First consider the case of an order-one operator £2 = b\ + B^V. By intro- 
ducing the matrix Hi with elements Hijj = (ifi,C2<fj)ci, the right-hand side 
of (3.2) can be written as Hiwo. Introducing the vector w = {w-y, . . . ,w n ) T , 
the left-hand side of (3.2) can be written as Cw, and one has 

w = C _1 Hiw => w~ N c (0,CH5- T Q Xo Hj- 1 C). 

Now, if £2 is on the form (2.5), the procedure can be used recursively, in 
the same way as when producing higher order Matern fields. For example, if 

£ 2 = (6 1 + B7V)(6 2 + BJV), 

the solution is obtained by solving X(s) = (6 2 + B 2 r V)5(s), where x is the 
weak solution to the previous example. Thus, when £2 is on the form (2.5), 
one has 

w ~ Nc^H-Tq^HT 1 ), H = C^KmC- 1 !^-! • • • C- 1 ^, 

where each factor Hj corresponds to the H-matrix obtained in the ith step 
in the recursion. 



3.1. Nonstationary fields. As mentioned in Lindgren, Rue and Lind- 
strom (2010), the Hilbert space approximation technique can also be used 
for nonstationary models, and the technique extends to the nested SPDE 
models as well. One again begins by finding the weak solution of the first 
part of the system, Ci(s)Xq(s) = W(s). The iterative procedure is used for 
obtaining approximations of high-order operators, so the fundamental step 
is to find the weak solution to the equation when C\ = (k 2 (s) — A). Consider 
the weak formulation 

(3.3) [fa, (k 2 (s) - A)i) n ] i=1 _ n £ [fa, W)n]i=i,...,n ) 

and note that the right-hand side of the equation is the same as in the 
stationary case, Nc(0,C _1 ). Now, using that 

(ipi, (k 2 (s) - A)x) n = {ip i ,K 2 (s)x)n - {tpi, Ax)n 

= {<pi,K 2 (s)x)n + {V(pi,Vx)n, 

the left-hand side of (3.3) can be written as (C + G)wo, where G and wo 
are the same as in the stationary case and C is a matrix with elements 

Cij = (ip i ,K 2 (s)(p j )Q = / K 2 (s)<^(s)</?j(s)ds 

Jq 

(3.4) 

w« 2 (sj) / ipi(s)ipj(s)ds = K 2 (s j )C i) j. 
Jn 
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Since {(fit} is assumed to be a local basis, such as B-spline wavelets or some 
other functions with compact support, the locations Sj can, for example, be 
chosen as the centers of the basis functions (fj(s). The error in the approxi- 
mation of C is then small if k 2 (s) varies slowly compared to the spacing of 
the basis functions ipj. Prom equation (3.4), one has C = Ck, where k is 
a diagonal matrix with elements Kjj = k 2 (sj). Finally, with K = kC + G, 
one has 

Kw ~ N c (0,C" 1 ) => w ~ Nc(0,KC _1 K). 

Now given the weak solution, xq, to £i(s)Jfo(s) = W(s), substitute Xq with 
xq in the second equation of (2.2) and consider the weak formulation of 
the equation. Since the solution to the full operator again can be found 
recursively, only the fundamental case £2 = b(s) + B(s) T V is considered. 
The weak formulation is the same as (3.2), and one has 

(ipi,x)n = (<Pi,C2Xo)n = {<Pi, (b(a) + B(s) T V)x ) n 
= (ipi,b(s)x )n + (ipi,B(s) T Vx )n- 
Thus, the right-hand side of (3.2) can be written as (C + H)wo, where 

Cij = (<Pi,b(s)(pj)si= / b(s)(pi(s)(pj(s)dsKib(sj)Ci i j, 
Jn 

ilij = {(Pi,B(s) T Vip j ) n = / ^(s)B(s) T Vv3j(s)ds 

Jn 



B(s j ) T / (pi(s)Vipj(s)ds. 
Jn 



Here, similar approximations as in equation (3.4) are used, so the expressions 
are accurate if the coefficients vary slowly compared to the spacing of the 
basis functions <pj. The left-hand side of (3.2) can again be written as Cw, 
so with Hi = C + H, one has w ~ N c (0, CH^ T Q Xo H^ 1 C). 

3.2. Practical considerations. The integrals that must be calculated to 
get explicit expressions for the matrices C, G and H are 

/ <pi(s)<pj(s)ds, / (Vi^(s)) T V(^(s)ds and / ^(s)V^j(s) ds. 
Jn Jn Jn 

In Section 5 a basis of piecewise linear functions induced by a triangulation 
of the Earth is used; see Figure 4. In this case, <pi(s) is a linear function 
on each triangle, and V</?j(s) is constant on each triangle. The integrals, 
therefore, have simple analytic expressions in this case, and more generally 
for all piecewise linear bases induced by triangulated 2-manifolds. 
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Bases induced by triangulations have many desirable properties, such as 
the simple analytic expression for the integrals and compact support. They 
are, however, not orthogonal, which causes C _1 to be dense. The weights w, 
therefore, have a dense precision matrix, unless C -1 is approximated with 
some sparse matrix. This issue is addressed in Lindgren, Rue and Lindstrom 
(2010) by lowering the integration order of (<Pi,<Pj), which results in an ap- 
proximate, diagonal C matrix, C, with diagonal elements Ca = Y^=i Qk- 
Bolin and Lindgren (2009) perform numerical studies on how this approx- 
imation affects the resulting covariance function of the process, and it is 
shown that the error is small if the approximation is used for piecewise lin- 
ear bases. We will, therefore, from now on use the approximate C matrix in 
all places where C is used. 

A natural question is how many basis functions one should use in order 
to get a good approximation of the solution. The answer will depend on 
the chosen basis, and, more importantly, on the specific parameters of the 
SPDE model. Bolin and Lindgren (2009) study the approximation error in 
the Matern case in M and M 2 for different bases, and in this case the spacing 
of the basis functions compared to the range of the covariance function for 
X(s) determines the approximation error: For a process with long range, 
fewer basis functions have to be used than for a process with short range to 
obtain the same approximation error. For more complicated, possibly non- 
stationary, nested SPDE models, there is no easy answer to how the number 
of basis functions should be chosen. Increasing the number of basis functions 
will decrease the approximation error but increase the computational com- 
plexity for the approximate model, so there is a trade-off between accuracy 
and computational cost. However, as long as the parameters vary slowly 
compared to the spacing of the basis functions, the approximation error will 
likely be much smaller than the error obtained from using a model that 
does not fit the data perfectly and from estimating the parameters from the 
data. Thus, for practical applications, the error in covariance induced by the 
Hilbert space approximation technique will likely not matter much. A more 
important consequence for practical applications when the piecewise linear 
basis is used is that the Kriging estimation of the field between two nodes in 
the triangulation is a linear interpolation of the values at the nodes. Thus, 
variations on a scale smaller than the spacing between the basis functions 
will not be captured correctly in the Kriging prediction. For practical ap- 
plications, it is therefore often best to choose the number of basis functions 
depending on the scale one is interested in the Kriging prediction on. 

For the ozone data in Section 5, the goal is to estimate daily maps of 
global ozone. As we are not interested in modeling small scale variations, 
we choose the number of basis functions so that the mean distance between 
basis functions is about 258 km. For this basis, the smallest distance between 
two basis functions is 222 km, and the largest distance is about 342 km. 
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Fig. 3. Parameter estimates for the covariance parameters in model F 1 for the ozone 
data as functions of the number of basis functions in the Hilbert space approximations. 

Estimating the model parameters using different numbers of basis func- 
tions will give different estimates, as the parameters are estimated to max- 
imize the likelihood for the approximate model instead of the exact SPDE. 
An example of this can be seen in Figure 3 where the estimates of the co- 
variance parameters for model F' (see Section 5 for a model description) for 
the ozone data are shown for varying numbers of basis functions. Instead of 
showing the actual parameter estimates, the figure shows the differences be- 
tween the estimates and the estimate when using the basis shown in Figure 
4, which has 9002 basis functions. Increasing the number of basis functions 
further, the estimates will finally converge to the estimates one would get 
using the exact SPDE representation. The curve that has not converged cor- 
responds to the dominating parameter in the vector field. Together with k, 
this parameter controls the correlation range of the ozone field. 

4. Parameter estimation. In this section a parameter estimation proce- 
dure for the nested SPDE models is presented. One alternative would be 
to use a Metropolis-Hastings algorithm, which is easy to implement, but 
computationally inefficient. A better alternative is to use direct numerical 
optimization to estimate the parameters. 

Let Y(s) be an observation of the latent field, X(s), given by (2.6) or (2.8), 
under mean zero Gaussian measurement noise, £ (s), with variance a 2 : 

(4.1) Y( S )=X(s) + £(s). 

Using the approximation procedure from Section 3, and assuming a regres- 
sion model for the latent field's mean value function, //(s), the measurement 
equation can then be written as 

Y = M/i + *w + e, 

where M is a matrix with the regression basis functions evaluated at the 
measurement locations, and \x is a vector containing the regression coeffi- 
cients that have to be estimated. The matrix $ contains the basis functions 
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for the Hilbert space approximation procedure evaluated at the measure- 
ment locations, and w is the vector with the stochastic weights. In Section 3 
it was shown that the vector w is Gaussian with mean zero and covariance 
matrix HQ^H T . Both Q Xo and H are sparse matrices, but neither the 
covariance matrix nor the precision matrix for w is sparse. Thus, it would 
seem as if one had to work with a dense covariance matrix, which would 
make maximum likelihood parameter estimation computationally infeasible 
for large data sets. However, because of the product form of the covari- 
ance matrix, one has that w = Hwo, where wo ~ Nc(0, Qxo)- Hence, the 
observation equation can be rewritten as 

(4.2) Y = M/i + $Hw + £. 

Interpreting A = <1?H as an observation matrix that depends on some of the 
parameters in the model, Y — M/x can now be seen as noisy observations of 
wo, which has a sparse precision matrix. The advantage with using (4.2) is 
that one then is in the setting of having observations of a latent Gaussian 
Markov random field, which facilitates the usage of sparse matrix techniques 
in the parameter estimation. 

Let ip denote all parameters in the model except for fi. Assuming that fi 
and ip are a priori independent, the posterior density can be written as 

7r(wo,/x,y>|Y) oc7r(Y|w o ,cj 2 )7r(wo|/x,'0)7r(/z)7r(i/j). 

Using a Gaussian prior distribution with mean fi and precision for the 
mean parameters, the posterior distribution can be reformulated as 

(4.3) 7r(w , /x, M Y) oc vr(w |^, V>, Y)tt(m|^, Y)tt(^| Y), 
where w |/x,i/>, Y ~ N c (b, Q), Y ~ Nc(b M , Q M ), and 

1 .t,„ _ _. * , n M T Y M T AQ~ 1 A T Y 
b=— A T Y-M/x, = Q^m^ + — { , 

A _ 1 aTa a ^ M T M M AQ 'A M 

Q = Q™ + — A T A, Q M = Qm + 2 ~4 • 

The calculations are omitted here since these expressions are calculated simi- 
larly to the posterior reformulation in Lindstrom and Lindgren (2008), which 
gives more computational details. Finally, the marginal posterior density 
7v(ij)\~Y) can be shown to be 

IQI 1/2 IQ„l 1/2 kl| V 2 " V " 2 ) 2 

By rewriting the posterior as (4.3), it can be integrated with respect to wo and 
/x, and instead of optimizing the full posterior with respect to Wo, £t and ip, on- 
ly the marginal posterior 7r(?/?|Y) has to be optimized with respect to rj}. This 
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is a lower dimensional optimization problem, which substantially decreases 
the computational complexity. Given the optimum, ip opt = argmax^, ir(ij)\Y), 
fi opt is then given by fi opt = Q~ 1 b At . In practice, the numerical optimization 
is carried out on log 7r (■*/>! Y). 

4.1. Estimating the parameter uncertainty. There are several ways one 
could estimate the uncertainty in the parameter estimates obtained by the 
parameter estimation procedure above. The simplest estimate of the uncer- 
tainty is obtained by numerically estimating the Hessian of the marginal 
posterior evaluated at the estimated parameters. The diagonal elements of 
the inverse of the Hessian can then be seen as estimates of the variance for 
the parameter estimates. 

Another method for obtaining more reliable uncertainty estimates is to 
use a Metropolis-Hastings based MCMC algorithm with proposal kernel 
similar to the one used in Lindstrom and Lindgren (2008). A quite efficient 
algorithm is obtained by using random walk proposals for the parameters, 
where the correlation matrix for the proposal distribution is taken as a 
rescaled version of the inverse of the Hessian matrix [Gelman, Roberts and 
Gilks (1996)]. 

Finally, a third method for estimating the uncertainties is to use the INLA 
framework [Rue, Martino and Chopin (2009)], available as an R package 
(http://www.r-inla.org/). In settings with latent Gaussian Markov ran- 
dom fields, integrated nested Laplace approximations (INLA) provide close 
approximations to posterior densities for a fraction of the cost of MCMC. For 
models with Gaussian data, the calculated densities are for practical pur- 
poses exact. In the current implementation of the INLA package, handling 
the full nested SPDE structure is cumbersome, so further enhancements are 
needed before one can take full advantage of the INLA method for these 
models. 

4.2. Computational complexity. In this section some details on the com- 
putational complexity for the parameter estimation and Kriging estimation 
are given. 

The most widely used method for spatial prediction is linear Kriging. In 
the Bayesian setting, the Kriging predictor simply is the posterior expec- 
tation of the latent field X given data and the estimated parameters. This 
expectation can be written as 

E(X\tp, /x, Y) = M/i + *HE(w ) = Mfi + *HQ x b. 

The computationally demanding part of this expression is to calculate Q _1 b. 
Since the n x n matrix Q is positive definite, this is most efficiently done 
using Cholesky factorization, forward substitution and back substitution: 
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Calculate the Cholesky triangle L such that Q = LL T , and given L, solve 
the linear system Lx = b. Finally, given x, solve L T y = x, where now y 
satisfies y = Q _1 b. Solving the forward substitution and back substitution 
are much less computationally demanding than calculating the Cholesky tri- 
angle. Hence, the computational cost for calculating the Kriging prediction 
is determined by the cost for calculating L. 

The computational complexity for the parameter estimation is determined 
by the optimization method that is used and the computational complex- 
ity for evaluating the marginal log-posterior log7r(?/>|Y). The most com- 
putationally demanding terms in log7r(V'|Y) are the two log-determinants 
l°g|Qiool an d log | Q | and the quadratic form YAQ _1 AY, which are also 
most efficiently calculated using Cholesky factorization. Given the Cholesky 
triangle L, the quadratic form can be obtained as x T x, where x is the 
solution to Lx = AY, and the log-determinant log|Q| is simply the sum 2 
2 Yli=i 1°S ■kjj. Thus, the computational cost for one evaluation of the margi- 
nal posterior is also determined by the cost for calculating L. Because of the 
sparsity structure of Q, this computational cost is 0(n), 0(n 3 / 2 ) and 0(n 2 ) 
for problems in one, two and three dimensions respectively [see Rue and 
Held (2005) for more details]. 

The computational complexity for the parameter estimation is highly 
dependent on the optimization method. If a Broyden-Fletcher-Goldfarb- 
Shanno (BFGS) procedure is used without an analytic expression for the 
gradients, the marginal posterior has to be evaluated p times for each step 
in the optimization, where p is the number of covariance parameters in the 
model. Thus, if p is large and the initial value for the optimization is cho- 
sen far from the optimal value, many thousand evaluations of the marginal 
posterior may be needed in the optimization. 

5. Application: Ozone data. On October 24, 1978, NASA launched the 
near-polar, Sun-synchronous orbiting satellite Nimbus-7. The satellite car- 
ried a TOMS instrument with the purpose of obtaining high-resolution 
global maps of atmospheric ozone [McPeters et al. (1996)]. The instrument 
measured backscattered solar ultraviolet radiation at 35 sample points along 
a line perpendicular to the orbital plane at 3-degree intervals from 51 de- 
grees on the right side of spacecraft to 51 degrees on the left. A new scan 
was started every eight seconds, and as the measurements required sunlight, 
the measurements were made during the sunlit portions of the orbit as the 
spacecraft moved from south to north. The data measured by the satellite 



2 Since only the difference between the log-determinants is needed, one should imple- 
ment the calculation as 2j])"_ 1 (logI/^j — logL^j), where L™^ and are the diagonal 
elements of the Cholesky factors, sorted in ascending order, and the sum is ordered by 
increasing absolute values of the differences. This reduces numerical issues. 
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has been calibrated and preprocessed into a "Level 2" data set of spatially 
and temporally irregular Total Column Ozone (TCO) measurements follow- 
ing the satellite orbit. There is also a daily "Level 3" data set with values 
processed into a regular latitude-longitude grid. Both Level 2 and Level 3 
data have been analyzed in recent papers in the statistical literature [Cressie 
and Johannesson (2008), Jun and Stein (2008), Stein (2007)]. 

In what follows, the nested SPDE models are used to obtain statistical 
estimates of a daily ozone map using a part of the Level 2 data. In particular, 
all data available for October 1st, 1988 is used, which is the same data set 
that was used by Cressie and Johannesson (2008). 

5.1. Statistical model. The measurement model (4.1) is used for the 
ozone data. That is, the measurements, V(s), are assumed to be obser- 
vations of a latent field of TCO ozone, X(s), under Gaussian measurement 
noise £(&) with a constant variance a 2 . We let X(s) have some mean value 
function, /i(s), and let the covariance structure be determined by a nested 
SPDE model. Inspired by Jun and Stein (2008), who proposed using differ- 
entiated Matern fields for modeling TCO ozone, we use the simplest nested 
SPDE model. Thus, Z(s) = X(s) — fi(s) is generated by the system 

( K 2 (s)-A)Z (s) = W(s) 

Z(s) = (6(s)+B(s) T V)Z (s), 

where W(s) is Gaussian white noise on the sphere. If k(s) is assumed to be 
constant, the ozone is modeled as a Gaussian field with a covariance struc- 
ture that is obtained by applying the differential operator (6(s) +B(s) T V) 
to a stationary Matern field, which is similar to the model by Jun and Stein 
(2008). If, on the other hand, k is spatially varying, the range of the Matern- 
like covariance function can vary with location. As in Stein (2007) and Jun 
and Stein (2008), the mean can be modeled using a regression basis of spher- 
ical harmonics; however, since the data set only contains measurements from 
one specific day, it is not possible to identify which part of the variation in 
the data that comes from a varying mean and which part that can be ex- 
plained by the variance-covariance structure of the latent field. To avoid this 
identifiability problem, fi(s) is assumed to be unknown but constant. The 
parameter k 2 (s) has to be positive, and for identifiability reasons, we also 
require 6(s) to be positive. We, therefore, let logK 2 (s) = J2k m K k,mXk,m{ s ) 
and logft(s) = rn 6fc im lfc, m (s), where Yk >m is the spherical harmonic of or- 
der k and mode m. Finally, the vector field B(s) is modeled using the vector 
spherical harmonics basis functions T£ m and T| m , presented in Appendix: 

B(s) = J2( B l™ T im(s) + Bl rn Tl rn {s)). 

k,m 
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To choose the number of basis functions for the parameters k 2 (s), b(s) 
and B(s), some model selection technique has to be used. Model selec- 
tion for this model class is difficult since the models can have both non- 
stationary mean value functions and nonstationary covariance structures. 
This makes standard variogram techniques inadequate in general, and we 
instead base the model selection on Akaike's Information Criterion (AIC) 
and the Bayesian Information Criterion (BIC) [Hastie, Tibshirani and Fried- 
man (2001)], which are suitable model selection tools for the nested SPDE 
models since the likelihood for the data can be evaluated efficiently. 

We estimate 13 models with different numbers of covariance parameters, 
presented in Table 1. The simplest model is a stationary Matern model, 
with four parameters to estimate, and the most complicated model has 100 
parameters to estimate, including the mean and the measurement noise vari- 
ance. There are three different types of models in Table 1: In the first type 
(models B, E, G and J), k 2 and b are spatially varying and the vector field 
B is assumed to be zero. In the second type (models C, F, I and L), b and B 
are spatially varying and k 2 is assumed to be constant. Finally, in the third 
type (model D, H, K and M), all parameters are spatially varying. 

A basis of 9002 piecewise linear functions induced by a triangulation of the 
Earth (see Figure 4) is used in the approximation procedure from Section 3 
to get efficient representations of each model, and the parameters are esti- 
mated using the procedure from Section 4. The computational cost for the 
parameter estimation only depends on the number of basis functions in the 
Hilbert space approximation, and not on the number of data points, which 
makes inference efficient even for this large data set. 

AIC and BIC for each of the fitted models can be seen in Figure 5. The fig- 
ure contains one panel for each of the three model types and one panel where 
AIC and BIC are shown for all models at once. The major improvement in 
AIC and BIC occurs when the orders of the basis functions are increased 
from one to two. For the first model type, with spatially varying k 2 and b, 



Table 1 

Maximal orders of the spherical harmonics used in the bases for the different parameters 
and total number of covariance parameters in the different models for X(s) 





A 


B 


C 


D 


E 


F 


G 


H 
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J 
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L 


M 


K 2 (S) 





1 





1 


2 





3 


2 





4 


3 





4 


6(b) 





1 


1 


1 
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2 


3 
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3 


4 


3 


4 


4 


B(s) 








1 


1 





2 





2 


3 





3 


4 


4 


Total 


2 


8 


11 


14 


18 


20 


32 


34 


47 


50 


02 


75 


98 



Notes: The actual number of basis functions for re 2 (s) and b(s) are given by (ord + 1) 2 , and 
for B(s), the actual number is 2(ord + l) 2 — 2, where ord is the maximal order indicated 
in the table. 
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Fig. 4. The left part shows the triangulation of the Earth used to define the piecewise 
linear basis functions in the Hilbert space approximation for ozone data. Each basis func- 
tion is one at a node in the triangulation, and decreases linearly to zero at the neighboring 
nodes. The right part of the figure shows one of these functions. 

the figure indicates that the results could be improved by increasing the 
orders of the basis functions further. However, for a given order of the basis 
functions, the other two model types have much lower AIC and BIC. Also, 
by comparing AIC and BIC for the second and third model types, one finds 
that there is not much gain in letting k 2 be spatially varying. We therefore 
conclude that a model with spatially varying b and B is most appropriate 
for this data. 

The estimated parameters b(s) and the length of the vectors B(s) for 
model F are shown in Figure 6. One thing to note in this figure is that 
the two parameters are fairly constant with respect to longitude, which 
indicates that the latent field could be axially symmetric, an assumption 
that was made by both Stein (2007) and Jun and Stein (2008). If the latent 
field indeed was axially symmetric, one would only need the basis functions 
that are constant with respect to longitude in the parameter bases. Since 
there is only one axially symmetric spherical harmonic for each order, this 
assumption drastically reduces the number of parameters for the models in 
Table 1. Let A'-M' denote the axially symmetric versions of models A-M. 
For these models, the number of basis functions for both k 2 (s) and b(s) 
is ord + 1, and the number of basis functions for B(s) is 2(ord + 1) — 2, 
where ord is the maximal order indicated in Table 1. The dashed lines in 
Figure 5 show AIC and BIC calculated for these models. Among the axially 
symmetric models, model F' is surprisingly good considering that it only 
has 8 covariance parameters. 

The Kriging estimate and its standard error for model F' are shown in 
Figures 7 and 8 respectively. The oscillating behavior near the equator for the 
standard error is explained by the fact that the satellite tracks are furthest 
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Fig. 5. AIC (squares) and BIC (circles) for the models A-M (solid lines) and the axially 
symmetric models A'-M' (dashed lines), scaled by a factor 10 . Note that the major 
improvement in AIC and BIC occurs when the orders of the basis functions are increased 
from one to two, and that the model type with spatially varying b and B seems to be most 
appropriate for this data. Also note that the axially symmetric model F 1 is surprisingly 
good considering that it only has 8 covariance parameters. 



apart there, which results in sparser measurements between the different 
tracks. Because the measurements are collected using backscattered sunlight, 
the variance close to the north pole is high, as there are no measurements 
there. As seen in Figure 9, there is not much spatial correlation in the 
residuals X — Y, which indicates a good model fit. In Figure 10, estimates 
of the local mean and variance of the residuals are shown. The mean is fairly 
constant across the globe, but there is a slight tendency for higher variance 
closer to the poles. This is due to the fact that the data really is space-time 
data, as the measurements are collected during a 24-hour period. Since the 
different satellite tracks are closest near the poles, the temporal variation of 
the data is most prominent here, and especially near the international date 
line where data is collected both at the first satellite track of the day and 
at the last track, 24 hours later. The area with high residual variance is one 
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Fig. 6. Estimated variance-scaling parameter, b(s), and the the norm of the vectors in 
the estimated vector field B(s) for model F. Note that the estimates are fairly constant 
with respect to longitude, which indicates that the latent field could be axially symmetric. 

of those places where measurements are taken both at the beginning and 
the end of the time period, and where the ozone concentration has changed 
during the time period between the measurements. One could include this 
effect by allowing the variance of the measurement noise to be spatially 
varying; however, one should really use a spatio-temporal model for the 
data to correctly account for the effect, which is outside the scope of this 
article. 




Fig. 7. Kriging estimate of TCO ozone in Dobson units using model F . 



Table 2 

Estimates of the covariance parameters in model F using all data but the first track 
(Yf), all data but the last track (Yi), and all data (Y ) 





ft 


a 




b 2 


b 3 


Si 


B 2 


B 3 




Yf 


0.74 


25.60 


5.85 


0.045 


0.34 


1.05 


2.59 


-6.84 


-0.84 


Y, 


0.73 


25.56 


5.82 


0.033 


0.34 


0.90 


2.38 


-7.01 


-0.82 


Y 


0.67 


34.09 


5.75 


0.054 


0.36 


0.70 


2.48 


-7.10 


-0.68 
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Fig. 8. Standard error in Dobson units for the Kriging estimate. The color bar in the 
left part of the figure has been truncated at 6 Dobson units. The behavior near the north 
pole can be seen in the right part of the figure. 



To see how much the temporal structure near the international date line 
influences the model fit, the parameters in model F' are re-estimated without 
using the first satellite track of the day and without using the last track of 
the day. The estimated parameters can be seen in Table 2 and, as expected, 
the estimate of the measurement noise variance is much lower when not 
using all date line data. The estimates of the covariance parameters for 
the latent field also change somewhat, but the large scale structure of the 
nonstationarity is preserved. 

To study how sensitive the Kriging estimates are to the model choice, the 
ratio between the Kriging estimates for the simple model F' and the large 
model M, and the ratio between the corresponding Kriging standard errors, 
are shown in Figure 11. There is not much difference between the two Kriging 
estimates, whereas there is a clear difference between the corresponding 
standard errors. Thus, if one only is interested in the Kriging estimate, it 
does not matter much which model is used, but if one also is interested in 
the standard error of the estimate, the model choice greatly influences the 
results. 




Fig. 9. Estimated covariance function for the Kriging residuals using model P 1 . 
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Fig. 10. Estimates of the local mean (left) and standard deviation (right) for the Kriging 
residuals using model F 1 . The mean is fairly constant across the globe, whereas the standard 
deviation is higher close to the poles and at the international date line because of the 
temporal structure in the data. 

5.2. Discussion. Before the nested SPDE models were used on the ozone 
data, several tests were performed on simulated data to verify that the model 
parameters in fact could be estimated using the estimation procedure in Sec- 
tion 4. These tests showed that the estimation procedure is robust given that 
the initial values for the parameters are not chosen too far from the true 
values. However, for nonstationary models with many covariance parame- 
ters, it is not easy to choose the initial values. To reduce this problem, the 
optimization is done in several steps. A stationary Matern model (model A) 
is estimated to get initial values for reo,o, &o,o an d o 2 . To estimate model 
B, all parameters are set to zero initially, except for the parameters that 
were estimated in model A. Another layer of spherical harmonics is added 
to the bases for k 2 (s) and b(s) for estimating model E using the model B 
parameters as initial values. This step-wise procedure of adding layers of 
spherical harmonics to the bases is then repeated to estimate the larger 
models. Numerical studies showed that this optimization procedure is quite 
robust even for large models; however, as in most other numerical optimiza- 
tion problems, there are no guarantees that the true optimal values have 
been found for all models for the ozone data. 




0.99 0.995 1 1.005 1.01 0.85 1.12 1.41 1.7 1.98 



Fig. 11. The ratio between the kriging estimates using model F 1 and model M (left), and 
the ratio between the corresponding kriging standard errors (right). Note that there is not 
much difference between the Kriging estimates, whereas there is a clear difference between 
the corresponding standard errors. 
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The application of the nested SPDE models to ozone data was inspired 
by Jun and Stein (2008), who proposed using differentiated Matern fields for 
modeling TCO ozone, and we conclude this section with some remarks on 
the similarities and differences between the nested SPDEs and their models. 
The most general model in Jun and Stein (2008) is on the form 



where Xj, i = 0, 1, 2, are i.i.d. Matern fields in M 3 , Pi,i = 1, 2, 3, 4, are nonran- 
dom functions depending on latitude, l\ denoted longitude and fa denoted 
latitude. This model is similar to the model used here, but there are some 
important differences. First of all, (5.1) contains a sum of three indepen- 
dent fields, which we cannot represent since the approximation procedure in 
Section 3 in this case loses its computational benefits. To get a model more 
similar to the nested SPDE model, one would have to let P^fa) = 0, and 
Xo(s) = Xi(s). Using Xq = X\ or Xq and X\ as i.i.d. copies of a Matern 
field gives different covariance functions, and without testing both cases it 
is hard to determine what is more appropriate for ozone data. 

Another important conceptual difference is how the methods deal with the 
spherical topology. The Matern fields in Jun and Stein (2008) are stochastic 
fields on R 3 , evaluated on the embedded sphere, which is equivalent to using 
chordal distance as the metric in a regular Matern covariance function. One 
might instead attempt to evaluate the covariance function using the arc- 
length distance, which is a more natural metric on the sphere. However, 
Theorem 2 from Gneiting (1998) shows that for Matern covariances with 
v>l, this procedure does not generate positive definite covariance functions. 
This means that the arc-length method cannot be used for any differentiable 
Matern fields. On the other hand, the nested SPDEs are directly defined on 
the sphere, and therefore inherently use the arc-length distance. 

There is, in theory, no difference between writing the directional derivative 



of X(s) as (P 2 (fa)w- +- p 3(fe)sr)*i(s) or B(s) T VI(s), but the latter is 



easier to work with in practice. If a vector field basis is used to model B(s), 
the process will not have any singularities as long as the basis functions 
are nonsingular, which is the case for the basis used in this paper. If, on 
the other hand, P2{fa) and P%{fa) are modeled separately, the process will 
be singular at the poles unless certain restrictions on the two functions are 
met. This fact is indeed noted by Jun and Stein (2008), but the authors do 
not seem to take the restrictions into account in the parameter estimation, 
which causes all their estimated models to have singularities at the poles. 



(5.1) 



X(s) = Pi(/ 2 )X (s) + MMfe)^ + Pb(h) Qf )Xi(b) 
+ j P 4 (Z 2 )J-X 2 (s), 
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Finally, the nested SPDE models are computationally efficient also for 
spatially irregular data, which allowed us to work with the TOMS Level 2 
data instead of the gridded Level 3 data. 

6. Concluding remarks. There is a need for computationally efficient 
stochastic models for environmental data. Lindgren, Rue and Lindstrom 
(2010) introduced an efficient procedure for obtaining Markov approxima- 
tions of, possibly nonstationary, Matern fields by considering Hilbert space 
approximations of the SPDE 

( K (s) 2 -A) q/2 A(s)=</>(s)W(s). 

In this work, the class of nonstationary nested SPDE models generated by 
(2.8) was introduced, and it was shown how the approximation methods in 
Lindgren, Rue and Lindstrom (2010) can be extended to this larger class 
of models. This model class contains a wider family of covariance models, 
including both Matern-like covariance functions and various oscillating co- 
variance functions. Because of the additional differential operator C2, the 
Hilbert space approximations for the nested SPDE models do not have the 
Markov structure the model in Lindgren, Rue and Lindstrom (2010) has, 
but all computational benefits from the Markov properties are preserved for 
the nested SPDE models using the procedure in Section 4. This allows us to 
fit complicated models with over 100 parameters to data sets with several 
hundred thousand measurements using only a standard personal computer. 

By choosing C2 = b + B T V, one obtains a model similar to what Jun and 
Stein (2008) used to analyze TOMS Level 3 ozone data, and we used this re- 
stricted nested SPDE model to analyze the global spatially irregular TOMS 
Level 2 data. This application illustrates the ability to use the model class 
to produce nonstationary covariance models on general smooth manifolds 
which efficiently can be used to study large spatially irregular data sets. 

The most important next step in this work is to make a spatio-temporal 
extension of the model class. This would allow us to produce not only spa- 
tial but also spatio-temporal ozone models and increase the applicability 
of the model class to other environmental modeling problems where time 
dependence is a necessary model component. 

APPENDIX: VECTOR SPHERICAL HARMONICS 

When using the nonstationary model (2.8) in practice, we assume that the 
parameters in the model can be expressed in terms of some basis functions. 
If working on the sphere, spherical harmonics is a convenient basis for the 
parameters taking values in R. On real form, the spherical harmonic Yk^ m (s) 
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of order k € Nq and mode m ■ 



Y k ,m(s) 



'2k + l (fc-|m|)! 



-k, . . . , k is denned as 

V / 2sin(mZi)Pjt ) _ m (sinZ 2 ), 



Air (A; + \m\ 



— k < m < 0, 
m = 0, 
< m < k, 



i\ (sinZ 2 ), 

\/2cos(m/i)P fc:m (sin/ 2 ), 

where li is the latitude, l\ is the longitude, and P kjm (-) are associated Leg- 
endre functions. We, however, also need a basis for the vector fields Bj(s), 
determining the direction and magnitude of differentiation. Since the vector 
fields in each point on the sphere must lie in the tangent space of S 2 , the 
basis functions also must satisfy this. A basis with this property is obtained 
by using a subset of the vector spherical harmonics [Hill (1954)]. For each 
spherical harmonic Yfe im (s), k > 0, define the two vector spherical harmonics 



(s), 



Vs2ifc, m (s) x s - 



Here x denotes the cross product in M 3 and V§2 is the gradient on § 2 . 
By defining the basis in this way, all basis functions in T 1 = {Ti m } and 

T 2 = {T| m } will obviously lie in the tangent space of S 2 . It is also easy to 
see that the basis is orthogonal in the sense that for any k,l>0,—k<m<k, 
and —Kn<L one has 



(T 



k,mi ^ln)§ 2 



0. 

k(k + l)5 k _i8 n 
k{k + l)5 k _i5„ 



These are indeed desirable properties for a vector field basis, but for the 
basis to be of any use in practice, a method for calculating the basis func- 
tions explicitly is needed. Such explicit expressions are given in the following 
proposition. 



Proposition A.l. 

written as 



1 



k.m ' 



^fe,m( s ) 

where 



With s = (x,y,z^ T ^ 
to 



and Y 2 , m (s) can be 



— m(^) c k ^ rn xzY k _\ m {s) -\- kxz Y krn (s) 



Ck,m 

mxY k _ m (s) 

C k m 

,(1 - z 2 )n_ 1>m (s) - (1 - z 2 )kzY Km {s) 



Cfc,; 

yYk-i,m( s ) +mzxY k _ 
-kxzY kt m(s) + c k ^ m xY k ^ m {s) + myzY kj 
-m(l - z 2 )Y k _ m (s) 



m(s) 



(2/g + l)(fc 2 - 
2/c - 1 



m 
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Proof. One has that V§2Yfc im = P§2 (Vrs lfe, m ), that is, the gradient on 
§ 2 can be obtained by first calculating the gradient in ]R 3 and then projecting 
the result onto 8 2 . If denotes the normalization constant for the spherical 
harmonic Y kjm (s), and the recursive relation 

d 

(1 - z 2 )—P ky7n {z) = kzP k)Tn (z) - (k + m)P k ^ ltm (z) 

is used, one has that 

3 if c m \ 

— Yfc, m (s) = 1 _ z 2 (^ y fc,m(s) - + l"l|)-^-^-l,m(s)J. 



Now, using that tan(Zx) = x 1 y, one has 

2n x y 

- COS (/i 



dh 
dx 



or 1 — z z 
1 x 



where the last equalities hold on S 2 . Using these relations gives 



my 

T^7- 



'^fc,— m(s), 



5 



mx 



^fc,— m (s). 



Thus, with 



(fc + |m|)- 



(2fc + l)(fc 2 - |m| 2 ) 
2fc-l 



one has that 

V R 3y fem (s) 



-myY k - m (s) 
mxYfc _ m (s) 
^^fc.mXs) ~~ Cfc m Yfc_i m (s) 



Finally, the desired result is obtained by calculating 



k,m 



f iLm X S = Sx"f it 



X - 1 fc.m' 



where 

P § 2 = (/ - SS T ) 



1 — x —xy —xz 
—xy 1 — y 2 —yz 
—xz —yz 1 — z 2 





2 



-y x 



-z y 
-x 




□ 
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